Mako hSDM BRT explore (CRW PAs)

Author

Emily Nazario

Published

June 25, 2024

On this document, I’ve included the results from the initial exploration into the different model outputs, ranking of covariate influence, performance metrics, and prediction maps. This first set of models only includes extracted covariate data at a daily temporal resolution, but I am also considering exploring models that include covariate data at a seasonal or annual temporal resolution. The pseudo absences used in these models were generated using correlated random walk approaches, but another quarto document includes models with background sampling pseudo absences. Lastly, hyperparameters were tuned using the caret package and across all models, a learning rate of 0.05 and tree complexity of 3 resulted in the highest accuracy. Lastly, the ‘pred_var’ predictor is a random set of numbers that will be used to identify which predictor variables should be included in the final model, and which are not informative.

The hypotheses I would like to test with these models are as follows:

H1: The AGI model will perform better than the dissolved oxygen and null model, and the dissolved oxygen model will perform better than the null model.

study objective being met: Which model performs the best and presents the best predictions (i.e., best predictive performance scores, most ecologically realistic suitability maps)?

H2: The inclusion of dissolved oxygen at deeper depths will result in better/more ecologically realistic habitat suitability predictions relative to the dissolved oxygen model considering surface values alone.

study objective being met: How does dissolved oxygen at different depths influence habitat suitability predictions relative to oxygen at the surface?

H3: The inclusion of the AGI at deeper depths will result in better/more ecologically realistic habitat suitability predictions relative to the AGI model considering surface values alone.

study objective being met: How does the aerobic growth index (AGI; environmental oxygen supply:theoretical oxygen demand) at different depths influence habitat suitability predictions relative to the aerobic growth index at the surface?

H4: There will be important relationships between dissolved oxygen/the AGI and latitude/distance to coast.

study objective being met: Are there any important relationships between dissolved oxygen or AGI at the surface or at depth and latitude or distance to the coast?

H5: The null model will predict higher habitat suitability in areas or during seasons or periods (upwelling or La Niña) with lower dissolved oxygen through the water column relative to the dissolved oxygen and AGI models.

study objective being met: How do the habitat suitability maps differ between the models? How do these variations compare for different points in time?

Base models

These three models represent three different options for the base model and either include spatial predictors, a tag ID predictor, both, or neither. These models were developed by splitting the data set into 75/25 train/test, and thus that is the model evaluation approach used here. However, once a model is selected, I can run additional evaluation metrics (i.e., LOO, k-fold). I can also complete these now depending on when that is typically performed.

base_Nspat_Ntag <- readRDS(brt_outputs[7])

# model performance 
ggBRT::ggPerformance(base_Nspat_Ntag)
                     Model 1
Total.Deviance     1.3862823
Residual.Deviance  0.5736928
Correlation        0.8285057
AUC                0.9664000
Per.Expl          58.6164539
cvDeviance         0.8622091
cvCorrelation      0.6620174
cvAUC              0.8766500
cvPer.Expl        37.8042183
#relative influence of predictors
ggBRT::ggInfluence(base_Nspat_Ntag) 

             rel.inf
bathy_mean 24.425351
temp_mean  18.087655
sal_mean   10.919911
chl_mean    9.336627
vostr_mean  6.686538
ssh_mean    6.168030
mld_mean    6.048063
bathy_sd    5.587410
uo_mean     3.417308
vo_mean     3.364026
uostr_mean  3.297190
pred_var    2.661890
#explore partial plots
gbm.plot(base_Nspat_Ntag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
base_int <- gbm.interactions(base_Nspat_Ntag)
base_int$rank.list
  var1.index var1.names var2.index var2.names int.size
1         10 bathy_mean          2  temp_mean   952.17
2          3   sal_mean          2  temp_mean   710.45
3          8   ssh_mean          2  temp_mean   512.92
4          6    vo_mean          3   sal_mean   417.59
5         10 bathy_mean          3   sal_mean   368.97
6          2  temp_mean          1   chl_mean   365.72
7         10 bathy_mean          8   ssh_mean   278.76
#predictive performance using test dataset 
preds <- predict.gbm(base_Nspat_Ntag, dat_test_base,
                     n.trees = base_Nspat_Ntag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_base$PA, preds) #get % deviance
[1] 0.6336927
dat_pred_base <- cbind(dat_test_base$PA, preds)
pres_base <- dat_pred_base[dat_pred_base[,1] == 1, 2]
abs_base <- dat_pred_base[dat_pred_base[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_base, a = abs_base)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.7232538
max(e@TPR + e@TNR -1) #TSS
[1] 0.7546782
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(base_Nspat_Ntag)
[1] 58.61645
#eval 75/25
eval_7525_modified(base_Nspat_Ntag, 
                 testInput = dat_test_base, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
6700 iterations were performed.
There were 12 predictors of which 12 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.3096193 0.7932161 0.9479571 0.9934021         0.5428799 0.5861645
base_Nspat_Ytag <- readRDS(brt_outputs[8])

# model performance 
ggBRT::ggPerformance(base_Nspat_Ytag)
                     Model 1
Total.Deviance     1.3862823
Residual.Deviance  0.2305879
Correlation        0.9503452
AUC                0.9979000
Per.Expl          83.3664540
cvDeviance         0.4538959
cvCorrelation      0.8528183
cvAUC              0.9680300
cvPer.Expl        67.2580483
#relative influence of predictors
ggBRT::ggInfluence(base_Nspat_Ytag) 

              rel.inf
tag        44.1894066
bathy_mean 17.5965698
temp_mean  11.0672136
sal_mean    6.8107330
chl_mean    4.6665128
ssh_mean    3.6780850
vostr_mean  3.5179667
mld_mean    2.5624157
bathy_sd    1.8533773
uostr_mean  1.2374256
vo_mean     1.1506546
uo_mean     1.0981552
pred_var    0.5714842
#explore partial plots
gbm.plot(base_Nspat_Ytag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
base_int <- gbm.interactions(base_Nspat_Ytag)
base_int$rank.list
  var1.index var1.names var2.index var2.names int.size
1          4   sal_mean          1        tag  1496.60
2          3  temp_mean          1        tag  1036.49
3         11 bathy_mean          1        tag   988.07
4          2   chl_mean          1        tag   512.43
5          9   ssh_mean          1        tag   369.40
6         11 bathy_mean          3  temp_mean   274.54
7          8 vostr_mean          1        tag   231.62
8         10   mld_mean          1        tag   207.08
#predictive performance using test dataset 
preds <- predict.gbm(base_Nspat_Ytag, dat_test_base,
                     n.trees = base_Nspat_Ytag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_base$PA, preds) #get % deviance
[1] 0.2859709
dat_pred_base <- cbind(dat_test_base$PA, preds)
pres_base <- dat_pred_base[dat_pred_base[,1] == 1, 2]
abs_base <- dat_pred_base[dat_pred_base[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_base, a = abs_base)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.7452783
max(e@TPR + e@TNR -1) #TSS
[1] 0.919968
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(base_Nspat_Ytag)
[1] 83.36645
#eval 75/25
eval_7525_modified(base_Nspat_Ytag, 
                 testInput = dat_test_base, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
10000 iterations were performed.
There were 13 predictors of which 13 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.1923806 0.9265175 0.9917029 0.9942702         0.7937122 0.8336645
base_Yspat_Ytag <- readRDS(brt_outputs[9])

# model performance 
ggBRT::ggPerformance(base_Yspat_Ytag)
                     Model 1
Total.Deviance     1.3862823
Residual.Deviance  0.1773819
Correlation        0.9648516
AUC                0.9992000
Per.Expl          87.2044910
cvDeviance         0.3814392
cvCorrelation      0.8803459
cvAUC              0.9770700
cvPer.Expl        72.4847405
#relative influence of predictors
ggBRT::ggInfluence(base_Yspat_Ytag) 

              rel.inf
tag        40.2206057
dist_coast 24.0608862
lat         9.4741336
temp_mean   4.9067106
bathy_mean  4.6525961
sal_mean    4.4004394
chl_mean    3.5004953
vostr_mean  2.4769759
ssh_mean    1.7393792
mld_mean    1.7041726
bathy_sd    0.8436843
uo_mean     0.5521552
vo_mean     0.5356691
uostr_mean  0.5120463
pred_var    0.4200505
#explore partial plots
gbm.plot(base_Yspat_Ytag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
base_int <- gbm.interactions(base_Yspat_Ytag)
base_int$rank.list
   var1.index var1.names var2.index var2.names int.size
1           2        lat          1        tag   883.10
2           5   sal_mean          1        tag   512.27
3           4  temp_mean          1        tag   484.21
4           3   chl_mean          1        tag   478.25
5          14 dist_coast          1        tag   452.76
6          12 bathy_mean          1        tag   425.15
7          11   mld_mean          1        tag   201.34
8          14 dist_coast          2        lat   160.15
9          14 dist_coast          4  temp_mean   153.48
10         10   ssh_mean          1        tag   149.67
11         15   pred_var          1        tag   139.75
#predictive performance using test dataset 
preds <- predict.gbm(base_Yspat_Ytag, dat_test_base,
                     n.trees = base_Yspat_Ytag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_base$PA, preds) #get % deviance
[1] 0.2323732
dat_pred_base <- cbind(dat_test_base$PA, preds)
pres_base <- dat_pred_base[dat_pred_base[,1] == 1, 2]
abs_base <- dat_pred_base[dat_pred_base[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_base, a = abs_base)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.7467905
max(e@TPR + e@TNR -1) #TSS
[1] 0.935647
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(base_Yspat_Ytag)
[1] 87.20449
#eval 75/25
eval_7525_modified(base_Yspat_Ytag, 
                 testInput = dat_test_base, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
10000 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.1713338 0.9418611 0.9946135 0.9957419         0.8323754 0.8720449

DO models

I ran a suite of models that include various combinations of data at depth, spatial predictors, and tag ID predictors. Moving forward, I would also like to include DO and the other environmental predictor variables as longer time scales (seasonal/annual).

do_0m_Nspat_Ytag <- readRDS(brt_outputs[14])

# model performance 
ggBRT::ggPerformance(do_0m_Nspat_Ytag)
                     Model 1
Total.Deviance     1.3862935
Residual.Deviance  0.3408774
Correlation        0.9058550
AUC                0.9897000
Per.Expl          75.4108818
cvDeviance         0.5312851
cvCorrelation      0.8146489
cvAUC              0.9559100
cvPer.Expl        61.6758566
#relative influence of predictors
ggBRT::ggInfluence(do_0m_Nspat_Ytag) 

              rel.inf
tag        40.9080062
bathy_mean 23.7019091
o2_mean_0m 14.1885697
temp_mean   4.2249870
chl_mean    3.4047577
sal_mean    2.9679120
ssh_mean    2.7533054
vostr_mean  1.6135297
mld_mean    1.4718496
bathy_sd    1.0656363
pred_var    1.0330993
uostr_mean  0.9371363
vo_mean     0.9056197
uo_mean     0.8236821
#explore partial plots
gbm.plot(do_0m_Nspat_Ytag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
do_int <- gbm.interactions(do_0m_Nspat_Ytag)
do_int$rank.list
   var1.index var1.names var2.index var2.names int.size
1          12 bathy_mean          1        tag  1100.34
2           2 o2_mean_0m          1        tag   577.24
3           5   sal_mean          1        tag   519.34
4           4  temp_mean          1        tag   249.16
5          10   ssh_mean          1        tag   212.28
6          12 bathy_mean          4  temp_mean   154.44
7           3   chl_mean          1        tag   153.94
8          13   bathy_sd          1        tag   126.85
9          11   mld_mean          1        tag   117.52
10         14   pred_var          1        tag   104.95
#predictive performance using test dataset 
preds <- predict.gbm(do_0m_Nspat_Ytag, dat_test_do,
                     n.trees = do_0m_Nspat_Ytag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_do$PA, preds) #get % deviance
[1] 0.3932422
dat_pred_do <- cbind(dat_test_do$PA, preds)
pres_do <- dat_pred_do[dat_pred_do[,1] == 1, 2]
abs_do <- dat_pred_do[dat_pred_do[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_do, a = abs_do)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.7399579
max(e@TPR + e@TNR -1) #TSS
[1] 0.8480786
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(do_0m_Nspat_Ytag)
[1] 75.41088
#eval 75/25
eval_7525_modified(do_0m_Nspat_Ytag, 
                 testInput = dat_test_do, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
6150 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.2392778 0.8810744 0.9811385 0.9981726           0.71633 0.7541088
do_0m_Yspat_Ytag <- readRDS(brt_outputs[15])

# model performance 
ggBRT::ggPerformance(do_0m_Yspat_Ytag)
                     Model 1
Total.Deviance     1.3862935
Residual.Deviance  0.3169398
Correlation        0.9131868
AUC                0.9912000
Per.Expl          77.1376144
cvDeviance         0.4910827
cvCorrelation      0.8306125
cvAUC              0.9627200
cvPer.Expl        64.5758521
#relative influence of predictors
ggBRT::ggInfluence(do_0m_Yspat_Ytag) 

              rel.inf
tag        39.3243042
dist_coast 23.4675901
o2_mean_0m  9.2630160
bathy_mean  7.6602469
lat         7.4436658
temp_mean   2.1578941
chl_mean    2.1270429
sal_mean    2.0922701
ssh_mean    1.4504847
mld_mean    1.0484217
pred_var    0.9808561
vostr_mean  0.7765505
bathy_sd    0.6166353
uostr_mean  0.5730787
vo_mean     0.5270670
uo_mean     0.4908760
#explore partial plots
gbm.plot(do_0m_Yspat_Ytag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
do_int <- gbm.interactions(do_0m_Yspat_Ytag)
do_int$rank.list
   var1.index var1.names var2.index var2.names int.size
1           2        lat          1        tag   537.19
2          13 bathy_mean          1        tag   441.38
3           3 o2_mean_0m          1        tag   378.08
4          15 dist_coast          1        tag   190.11
5           6   sal_mean          1        tag   188.58
6           4   chl_mean          1        tag   144.71
7          11   ssh_mean          1        tag   112.11
8           5  temp_mean          1        tag    96.88
9          16   pred_var          1        tag    90.22
10         14   bathy_sd          1        tag    83.50
11          3 o2_mean_0m          2        lat    75.98
12         12   mld_mean          1        tag    69.54
13          5  temp_mean          2        lat    62.79
#predictive performance using test dataset 
preds <- predict.gbm(do_0m_Yspat_Ytag, dat_test_do,
                     n.trees = do_0m_Yspat_Ytag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_do$PA, preds) #get % deviance
[1] 0.366281
dat_pred_do <- cbind(dat_test_do$PA, preds)
pres_do <- dat_pred_do[dat_pred_do[,1] == 1, 2]
abs_do <- dat_pred_do[dat_pred_do[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_do, a = abs_do)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.741393
max(e@TPR + e@TNR -1) #TSS
[1] 0.8580333
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(do_0m_Yspat_Ytag)
[1] 77.13761
#eval 75/25
eval_7525_modified(do_0m_Yspat_Ytag, 
                 testInput = dat_test_do, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5600 iterations were performed.
There were 16 predictors of which 16 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.2303471 0.8902957 0.9839578 0.9975502         0.7357788 0.7713761
do_0m_60m_Nspat_Ytag <- readRDS(brt_outputs[13])

# model performance 
ggBRT::ggPerformance(do_0m_60m_Nspat_Ytag)
                     Model 1
Total.Deviance     1.3862935
Residual.Deviance  0.3275756
Correlation        0.9101734
AUC                0.9904000
Per.Expl          76.3704047
cvDeviance         0.5211133
cvCorrelation      0.8185662
cvAUC              0.9575900
cvPer.Expl        62.4096016
#relative influence of predictors
ggBRT::ggInfluence(do_0m_60m_Nspat_Ytag) 

               rel.inf
tag         39.7630992
bathy_mean  22.3644729
o2_mean_0m  13.1609254
o2_mean_60m  5.7538384
chl_mean     3.4719050
temp_mean    3.0814247
sal_mean     2.8796698
ssh_mean     2.4945921
mld_mean     1.4530477
vostr_mean   1.2564378
pred_var     1.0243164
bathy_sd     0.9583521
vo_mean      0.8437129
uostr_mean   0.8075706
uo_mean      0.6866347
#explore partial plots
gbm.plot(do_0m_60m_Nspat_Ytag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
do_int <- gbm.interactions(do_0m_60m_Nspat_Ytag)
do_int$rank.list
   var1.index  var1.names var2.index var2.names int.size
1          12  bathy_mean          1        tag  1007.87
2           2  o2_mean_0m          1        tag   483.04
3           5    sal_mean          1        tag   376.66
4          14 o2_mean_60m          1        tag   244.01
5           4   temp_mean          1        tag   223.33
6          10    ssh_mean          1        tag   169.05
7           3    chl_mean          1        tag   158.22
8          13    bathy_sd          1        tag   133.89
9          12  bathy_mean          4  temp_mean    98.89
10         11    mld_mean          1        tag    89.75
11          7  uostr_mean          1        tag    85.56
#predictive performance using test dataset 
preds <- predict.gbm(do_0m_60m_Nspat_Ytag, dat_test_do,
                     n.trees = do_0m_60m_Nspat_Ytag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_do$PA, preds) #get % deviance
[1] 0.3819614
dat_pred_do <- cbind(dat_test_do$PA, preds)
pres_do <- dat_pred_do[dat_pred_do[,1] == 1, 2]
abs_do <- dat_pred_do[dat_pred_do[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_do, a = abs_do)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.7404437
max(e@TPR + e@TNR -1) #TSS
[1] 0.8547024
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(do_0m_60m_Nspat_Ytag)
[1] 76.3704
#eval 75/25
eval_7525_modified(do_0m_60m_Nspat_Ytag, 
                 testInput = dat_test_do, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
6300 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained PseudoR2
1 0.2355947 0.8847726 0.9820592 0.9988765         0.7244676 0.763704
do_0m_250m_Nspat_Ytag <- readRDS(brt_outputs[10])

# model performance 
ggBRT::ggPerformance(do_0m_250m_Nspat_Ytag)
                     Model 1
Total.Deviance     1.3862935
Residual.Deviance  0.3181841
Correlation        0.9124632
AUC                0.9911000
Per.Expl          77.0478579
cvDeviance         0.5109006
cvCorrelation      0.8216070
cvAUC              0.9592600
cvPer.Expl        63.1462878
#relative influence of predictors
ggBRT::ggInfluence(do_0m_250m_Nspat_Ytag) 

                rel.inf
tag          39.6887396
o2_mean_0m   16.8057119
o2_mean_250m 16.3627195
bathy_mean   11.6711360
temp_mean     2.7622574
chl_mean      2.2458410
sal_mean      2.1797729
ssh_mean      1.9088688
pred_var      1.1660577
mld_mean      1.0505143
bathy_sd      0.9624893
vostr_mean    0.9540213
uostr_mean    0.7842279
uo_mean       0.7318377
vo_mean       0.7258045
#explore partial plots
gbm.plot(do_0m_250m_Nspat_Ytag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
do_int <- gbm.interactions(do_0m_250m_Nspat_Ytag)
do_int$rank.list
   var1.index   var1.names var2.index var2.names int.size
1          12   bathy_mean          1        tag   624.45
2           2   o2_mean_0m          1        tag   515.76
3          14 o2_mean_250m          1        tag   451.56
4           5     sal_mean          1        tag   437.13
5          14 o2_mean_250m          2 o2_mean_0m   206.71
6           3     chl_mean          1        tag   190.92
7          10     ssh_mean          1        tag   169.35
8           4    temp_mean          1        tag   130.26
9          13     bathy_sd          1        tag   108.91
10          5     sal_mean          2 o2_mean_0m   103.02
11         15     pred_var          1        tag    95.95
#predictive performance using test dataset 
preds <- predict.gbm(do_0m_250m_Nspat_Ytag, dat_test_do,
                     n.trees = do_0m_250m_Nspat_Ytag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_do$PA, preds) #get % deviance
[1] 0.3744491
dat_pred_do <- cbind(dat_test_do$PA, preds)
pres_do <- dat_pred_do[dat_pred_do[,1] == 1, 2]
abs_do <- dat_pred_do[dat_pred_do[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_do, a = abs_do)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.7407434
max(e@TPR + e@TNR -1) #TSS
[1] 0.8573576
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(do_0m_250m_Nspat_Ytag)
[1] 77.04786
#eval 75/25
eval_7525_modified(do_0m_250m_Nspat_Ytag, 
                 testInput = dat_test_do, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
6250 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.2338012 0.8864079 0.9826879 0.9979236         0.7298866 0.7704786
do_0m_60m_250m_Nspat_Ytag <- readRDS(brt_outputs[11])

# model performance 
ggBRT::ggPerformance(do_0m_60m_250m_Nspat_Ytag)
                     Model 1
Total.Deviance     1.3862935
Residual.Deviance  0.3191965
Correlation        0.9122045
AUC                0.9909000
Per.Expl          76.9748229
cvDeviance         0.5103819
cvCorrelation      0.8220382
cvAUC              0.9594100
cvPer.Expl        63.1837098
#relative influence of predictors
ggBRT::ggInfluence(do_0m_60m_250m_Nspat_Ytag) 

                rel.inf
tag          38.7855718
o2_mean_0m   16.7326988
o2_mean_250m 15.8676740
bathy_mean   11.1462711
o2_mean_60m   3.6391761
temp_mean     2.3942027
sal_mean      2.1301516
chl_mean      2.0440835
ssh_mean      1.6371297
pred_var      1.0443052
mld_mean      0.9945159
vostr_mean    0.8817324
bathy_sd      0.8374595
uostr_mean    0.7158327
vo_mean       0.6096311
uo_mean       0.5395639
#explore partial plots
gbm.plot(do_0m_60m_250m_Nspat_Ytag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
do_int <- gbm.interactions(do_0m_60m_250m_Nspat_Ytag)
do_int$rank.list
   var1.index   var1.names var2.index var2.names int.size
1          12   bathy_mean          1        tag   569.59
2          15 o2_mean_250m          1        tag   463.02
3           2   o2_mean_0m          1        tag   458.51
4           5     sal_mean          1        tag   328.66
5           4    temp_mean          1        tag   178.30
6           3     chl_mean          1        tag   175.45
7          14  o2_mean_60m          1        tag   168.87
8          10     ssh_mean          1        tag   147.64
9          15 o2_mean_250m          2 o2_mean_0m   141.39
10         13     bathy_sd          1        tag   101.16
11         16     pred_var          1        tag    73.16
12         11     mld_mean          1        tag    72.52
13          7   uostr_mean          1        tag    68.16
#predictive performance using test dataset 
preds <- predict.gbm(do_0m_60m_250m_Nspat_Ytag, dat_test_do,
                     n.trees = do_0m_60m_250m_Nspat_Ytag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_do$PA, preds) #get % deviance
[1] 0.375977
dat_pred_do <- cbind(dat_test_do$PA, preds)
pres_do <- dat_pred_do[dat_pred_do[,1] == 1, 2]
abs_do <- dat_pred_do[dat_pred_do[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_do, a = abs_do)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.7405948
max(e@TPR + e@TNR -1) #TSS
[1] 0.8528994
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(do_0m_60m_250m_Nspat_Ytag)
[1] 76.97482
#eval 75/25
eval_7525_modified(do_0m_60m_250m_Nspat_Ytag, 
                 testInput = dat_test_do, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5800 iterations were performed.
There were 16 predictors of which 16 had non-zero influence.
       RMSE     Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.2347064 0.88541 0.9824251 0.9994384         0.7287845 0.7697482
do_0m_60m_250m_Yspat_Ytag <- readRDS(brt_outputs[12])

# model performance 
ggBRT::ggPerformance(do_0m_60m_250m_Yspat_Ytag)
                     Model 1
Total.Deviance     1.3862935
Residual.Deviance  0.3011707
Correlation        0.9188137
AUC                0.9925000
Per.Expl          78.2751108
cvDeviance         0.4820971
cvCorrelation      0.8345785
cvAUC              0.9641300
cvPer.Expl        65.2240256
#relative influence of predictors
ggBRT::ggInfluence(do_0m_60m_250m_Yspat_Ytag) 

                rel.inf
tag          37.5967831
dist_coast   20.6808825
o2_mean_0m   10.5992826
o2_mean_250m  8.5196739
lat           4.6145739
bathy_mean    3.8293017
o2_mean_60m   2.9614725
temp_mean     1.9990685
chl_mean      1.8128857
sal_mean      1.6521470
ssh_mean      1.2297337
pred_var      1.0165713
mld_mean      0.8872307
bathy_sd      0.6138532
uostr_mean    0.5819161
vostr_mean    0.4989877
vo_mean       0.4618753
uo_mean       0.4437606
#explore partial plots
gbm.plot(do_0m_60m_250m_Yspat_Ytag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
do_int <- gbm.interactions(do_0m_60m_250m_Yspat_Ytag)
do_int$rank.list
   var1.index   var1.names var2.index var2.names int.size
1           2          lat          1        tag   367.17
2          13   bathy_mean          1        tag   301.62
3           3   o2_mean_0m          1        tag   273.23
4          17 o2_mean_250m          1        tag   253.83
5           6     sal_mean          1        tag   176.41
6           4     chl_mean          1        tag   142.58
7          15   dist_coast          1        tag   137.30
8          11     ssh_mean          1        tag   106.92
9          16  o2_mean_60m          1        tag   100.63
10          5    temp_mean          1        tag    76.80
11         18     pred_var          1        tag    74.23
12         14     bathy_sd          1        tag    69.96
13         12     mld_mean          1        tag    67.22
14          9      vo_mean          1        tag    54.84
15         10   vostr_mean          1        tag    49.70
16         17 o2_mean_250m          3 o2_mean_0m    40.74
#predictive performance using test dataset 
preds <- predict.gbm(do_0m_60m_250m_Yspat_Ytag, dat_test_do,
                     n.trees = do_0m_60m_250m_Yspat_Ytag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_do$PA, preds) #get % deviance
[1] 0.3562514
dat_pred_do <- cbind(dat_test_do$PA, preds)
pres_do <- dat_pred_do[dat_pred_do[,1] == 1, 2]
abs_do <- dat_pred_do[dat_pred_do[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_do, a = abs_do)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.7417819
max(e@TPR + e@TNR -1) #TSS
[1] 0.8645261
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(do_0m_60m_250m_Yspat_Ytag)
[1] 78.27511
#eval 75/25
eval_7525_modified(do_0m_60m_250m_Yspat_Ytag, 
                 testInput = dat_test_do, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5800 iterations were performed.
There were 18 predictors of which 18 had non-zero influence.
       RMSE      Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.2273319 0.893137 0.9847825 0.9983867         0.7430137 0.7827511

AGI models

I ran a suite of models that include various combinations of data at depth, spatial predictors, and tag ID predictors. Moving forward, I would also like to include AGI and the other environmental predictor variables as longer time scales (seasonal/annual).

agi_0m_Nspat_Ytag <- readRDS(brt_outputs[5])

# model performance 
ggBRT::ggPerformance(agi_0m_Nspat_Ytag)
                     Model 1
Total.Deviance     1.3862829
Residual.Deviance  0.3294417
Correlation        0.9120135
AUC                0.9914000
Per.Expl          76.2356104
cvDeviance         0.5331603
cvCorrelation      0.8132824
cvAUC              0.9557600
cvPer.Expl        61.5402963
#relative influence of predictors
ggBRT::ggInfluence(agi_0m_Nspat_Ytag) 

              rel.inf
tag        41.4342777
bathy_mean 22.5768731
temp_mean   8.8146014
AGI_0m      5.3409758
sal_mean    4.3539170
ssh_mean    4.0682555
chl_mean    3.4388349
vostr_mean  2.4504783
uostr_mean  1.5708916
mld_mean    1.3780963
bathy_sd    1.3161888
vo_mean     1.2188877
pred_var    1.0875914
uo_mean     0.9501306
#explore partial plots
gbm.plot(agi_0m_Nspat_Ytag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
agi_int <- gbm.interactions(agi_0m_Nspat_Ytag)
agi_int$rank.list
   var1.index var1.names var2.index var2.names int.size
1          11 bathy_mean          1        tag   803.70
2           4   sal_mean          1        tag   650.52
3          13     AGI_0m          3  temp_mean   565.85
4           3  temp_mean          1        tag   323.97
5           9   ssh_mean          1        tag   312.45
6          11 bathy_mean          3  temp_mean   257.21
7          13     AGI_0m          1        tag   254.47
8          12   bathy_sd          1        tag   207.39
9           2   chl_mean          1        tag   203.19
10         10   mld_mean          1        tag   153.81
#predictive performance using test dataset 
preds <- predict.gbm(agi_0m_Nspat_Ytag, dat_test_agi,
                     n.trees = agi_0m_Nspat_Ytag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_agi$PA, preds) #get % deviance
[1] 0.3771666
dat_pred_agi <- cbind(dat_test_agi$PA, preds)
pres_agi <- dat_pred_agi[dat_pred_agi[,1] == 1, 2]
abs_agi <- dat_pred_agi[dat_pred_agi[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_agi, a = abs_agi)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.7412855
max(e@TPR + e@TNR -1) #TSS
[1] 0.8659124
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(agi_0m_Nspat_Ytag)
[1] 76.23561
#eval 75/25
eval_7525_modified(agi_0m_Nspat_Ytag, 
                 testInput = dat_test_agi, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
7250 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
       RMSE      Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.2314333 0.890198 0.9838225  1.000695         0.7279205 0.7623561
agi_0m_Yspat_Ytag <- readRDS(brt_outputs[6])

# model performance 
ggBRT::ggPerformance(agi_0m_Yspat_Ytag)
                     Model 1
Total.Deviance     1.3862829
Residual.Deviance  0.3079406
Correlation        0.9177214
AUC                0.9925000
Per.Expl          77.7865966
cvDeviance         0.4916562
cvCorrelation      0.8299703
cvAUC              0.9625900
cvPer.Expl        64.5342064
#relative influence of predictors
ggBRT::ggInfluence(agi_0m_Yspat_Ytag) 

              rel.inf
tag        39.1704772
dist_coast 22.9067106
lat         9.2973311
bathy_mean  7.5281540
temp_mean   4.6964781
AGI_0m      4.1053374
sal_mean    2.6165565
chl_mean    2.4600045
ssh_mean    1.8019982
pred_var    1.1308360
mld_mean    0.9622477
vostr_mean  0.7571586
bathy_sd    0.7305617
uostr_mean  0.6250570
uo_mean     0.6156725
vo_mean     0.5954189
#explore partial plots
gbm.plot(agi_0m_Yspat_Ytag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
agi_int <- gbm.interactions(agi_0m_Yspat_Ytag)
agi_int$rank.list
   var1.index var1.names var2.index var2.names int.size
1           2        lat          1        tag   646.34
2          12 bathy_mean          1        tag   375.27
3           5   sal_mean          1        tag   260.79
4           3   chl_mean          1        tag   187.34
5           4  temp_mean          1        tag   176.90
6          15 dist_coast          1        tag   162.12
7          14     AGI_0m          1        tag   154.37
8          13   bathy_sd          1        tag   134.66
9          16   pred_var          1        tag   101.32
10         12 bathy_mean          4  temp_mean    99.58
11         10   ssh_mean          1        tag    92.42
12         11   mld_mean          1        tag    67.39
13          9 vostr_mean          1        tag    61.50
#predictive performance using test dataset 
preds <- predict.gbm(agi_0m_Yspat_Ytag, dat_test_agi,
                     n.trees = agi_0m_Yspat_Ytag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_agi$PA, preds) #get % deviance
[1] 0.3514956
dat_pred_agi <- cbind(dat_test_agi$PA, preds)
pres_agi <- dat_pred_agi[dat_pred_agi[,1] == 1, 2]
abs_agi <- dat_pred_agi[dat_pred_agi[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_agi, a = abs_agi)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.7425883
max(e@TPR + e@TNR -1) #TSS
[1] 0.8786994
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(agi_0m_Yspat_Ytag)
[1] 77.7866
#eval 75/25
eval_7525_modified(agi_0m_Yspat_Ytag, 
                 testInput = dat_test_agi, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
6350 iterations were performed.
There were 16 predictors of which 16 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained PseudoR2
1 0.2229464 0.8984309 0.9864263  1.002449          0.746439 0.777866
agi_0m_60m_Nspat_Ytag <- readRDS(brt_outputs[4])

# model performance 
ggBRT::ggPerformance(agi_0m_60m_Nspat_Ytag)
                     Model 1
Total.Deviance     1.3862829
Residual.Deviance  0.3298267
Correlation        0.9115959
AUC                0.9912000
Per.Expl          76.2078336
cvDeviance         0.5248204
cvCorrelation      0.8175566
cvAUC              0.9574200
cvPer.Expl        62.1418985
#relative influence of predictors
ggBRT::ggInfluence(agi_0m_60m_Nspat_Ytag) 

              rel.inf
tag        41.1130461
bathy_mean 22.1175206
temp_mean   8.8167602
AGI_0m      5.1923827
sal_mean    3.8146119
AGI_60m     3.5914584
ssh_mean    3.3646809
chl_mean    2.8805052
vostr_mean  2.2362877
mld_mean    1.3300028
uostr_mean  1.2641285
vo_mean     1.2078786
bathy_sd    1.1634177
pred_var    1.0421567
uo_mean     0.8651621
#explore partial plots
gbm.plot(agi_0m_60m_Nspat_Ytag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
agi_int <- gbm.interactions(agi_0m_60m_Nspat_Ytag)
agi_int$rank.list
   var1.index var1.names var2.index var2.names int.size
1          11 bathy_mean          1        tag   680.43
2          13     AGI_0m          3  temp_mean   677.26
3           4   sal_mean          1        tag   537.76
4           3  temp_mean          1        tag   412.50
5          14    AGI_60m          1        tag   247.42
6          13     AGI_0m          1        tag   229.24
7           9   ssh_mean          1        tag   222.42
8          11 bathy_mean          3  temp_mean   187.87
9          12   bathy_sd          1        tag   151.81
10         10   mld_mean          1        tag   151.59
11          2   chl_mean          1        tag   146.54
#predictive performance using test dataset 
preds <- predict.gbm(agi_0m_60m_Nspat_Ytag, dat_test_agi,
                     n.trees = agi_0m_60m_Nspat_Ytag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_agi$PA, preds) #get % deviance
[1] 0.3767186
dat_pred_agi <- cbind(dat_test_agi$PA, preds)
pres_agi <- dat_pred_agi[dat_pred_agi[,1] == 1, 2]
abs_agi <- dat_pred_agi[dat_pred_agi[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_agi, a = abs_agi)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.7412248
max(e@TPR + e@TNR -1) #TSS
[1] 0.8623808
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(agi_0m_60m_Nspat_Ytag)
[1] 76.20783
#eval 75/25
eval_7525_modified(agi_0m_60m_Nspat_Ytag, 
                 testInput = dat_test_agi, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
6700 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.2313699 0.8901793 0.9837147  1.000391         0.7282437 0.7620783
agi_0m_250m_Nspat_Ytag <- readRDS(brt_outputs[1])

# model performance 
ggBRT::ggPerformance(agi_0m_250m_Nspat_Ytag)
                     Model 1
Total.Deviance     1.3862829
Residual.Deviance  0.3250560
Correlation        0.9119144
AUC                0.9912000
Per.Expl          76.5519740
cvDeviance         0.5193933
cvCorrelation      0.8180693
cvAUC              0.9578900
cvPer.Expl        62.5333843
#relative influence of predictors
ggBRT::ggInfluence(agi_0m_250m_Nspat_Ytag) 

              rel.inf
tag        40.4877590
AGI_250m   16.0481464
bathy_mean 11.8550705
temp_mean  11.3274832
AGI_0m      4.0294752
sal_mean    3.1191434
ssh_mean    2.8171750
chl_mean    2.4561101
vostr_mean  1.7373119
bathy_sd    1.1323632
mld_mean    1.0927830
uostr_mean  1.0841796
pred_var    1.0292176
vo_mean     1.0064972
uo_mean     0.7772848
#explore partial plots
gbm.plot(agi_0m_250m_Nspat_Ytag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
agi_int <- gbm.interactions(agi_0m_250m_Nspat_Ytag)
agi_int$rank.list
   var1.index var1.names var2.index var2.names int.size
1           4   sal_mean          1        tag   628.92
2          11 bathy_mean          1        tag   513.82
3          14   AGI_250m          1        tag   395.22
4           3  temp_mean          1        tag   374.74
5           2   chl_mean          1        tag   242.62
6           9   ssh_mean          1        tag   218.77
7          13     AGI_0m          3  temp_mean   203.96
8          13     AGI_0m          1        tag   202.13
9          14   AGI_250m          3  temp_mean   136.10
10         12   bathy_sd          1        tag   120.44
11         15   pred_var          1        tag   112.61
#predictive performance using test dataset 
preds <- predict.gbm(agi_0m_250m_Nspat_Ytag, dat_test_agi,
                     n.trees = agi_0m_250m_Nspat_Ytag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_agi$PA, preds) #get % deviance
[1] 0.370768
dat_pred_agi <- cbind(dat_test_agi$PA, preds)
pres_agi <- dat_pred_agi[dat_pred_agi[,1] == 1, 2]
abs_agi <- dat_pred_agi[dat_pred_agi[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_agi, a = abs_agi)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.741587
max(e@TPR + e@TNR -1) #TSS
[1] 0.864793
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(agi_0m_250m_Nspat_Ytag)
[1] 76.55197
#eval 75/25
eval_7525_modified(agi_0m_250m_Nspat_Ytag, 
                 testInput = dat_test_agi, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
6450 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
       RMSE      Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.2300722 0.891287 0.9844429  1.001254         0.7325363 0.7655197
agi_0m_60m_250m_Nspat_Ytag <- readRDS(brt_outputs[2])

# model performance 
ggBRT::ggPerformance(agi_0m_60m_250m_Nspat_Ytag)
                     Model 1
Total.Deviance     1.3862829
Residual.Deviance  0.3194562
Correlation        0.9142194
AUC                0.9917000
Per.Expl          76.9559147
cvDeviance         0.5170729
cvCorrelation      0.8191713
cvAUC              0.9583200
cvPer.Expl        62.7007660
#relative influence of predictors
ggBRT::ggInfluence(agi_0m_60m_250m_Nspat_Ytag) 

              rel.inf
tag        39.9057995
AGI_250m   15.9521143
bathy_mean 11.5951719
temp_mean  11.0079998
AGI_0m      3.9165535
sal_mean    2.9778595
AGI_60m     2.5421021
ssh_mean    2.3136229
chl_mean    2.1326512
vostr_mean  1.7126189
uostr_mean  1.2278622
bathy_sd    1.0302718
mld_mean    1.0176906
pred_var    1.0118376
vo_mean     0.9572175
uo_mean     0.6986267
#explore partial plots
gbm.plot(agi_0m_60m_250m_Nspat_Ytag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
agi_int <- gbm.interactions(agi_0m_60m_250m_Nspat_Ytag)
agi_int$rank.list
   var1.index var1.names var2.index var2.names int.size
1           4   sal_mean          1        tag   598.01
2          11 bathy_mean          1        tag   474.54
3          15   AGI_250m          1        tag   312.33
4           3  temp_mean          1        tag   311.72
5           2   chl_mean          1        tag   242.05
6          14    AGI_60m          1        tag   224.33
7           9   ssh_mean          1        tag   200.59
8          13     AGI_0m          1        tag   191.78
9          13     AGI_0m          3  temp_mean   179.03
10         12   bathy_sd          1        tag   122.19
11         15   AGI_250m          3  temp_mean   119.14
12          8 vostr_mean          1        tag    81.82
13         16   pred_var          1        tag    80.72
#predictive performance using test dataset 
preds <- predict.gbm(agi_0m_60m_250m_Nspat_Ytag, dat_test_agi,
                     n.trees = agi_0m_60m_250m_Nspat_Ytag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_agi$PA, preds) #get % deviance
[1] 0.3659758
dat_pred_agi <- cbind(dat_test_agi$PA, preds)
pres_agi <- dat_pred_agi[dat_pred_agi[,1] == 1, 2]
abs_agi <- dat_pred_agi[dat_pred_agi[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_agi, a = abs_agi)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.7417961
max(e@TPR + e@TNR -1) #TSS
[1] 0.8650836
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(agi_0m_60m_250m_Nspat_Ytag)
[1] 76.95591
#eval 75/25
eval_7525_modified(agi_0m_60m_250m_Nspat_Ytag, 
                 testInput = dat_test_agi, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
6400 iterations were performed.
There were 16 predictors of which 16 had non-zero influence.
    RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.2283 0.8930857 0.9848702  1.000961         0.7359933 0.7695591
agi_0m_60m_250m_Yspat_Ytag <- readRDS(brt_outputs[3])

# model performance 
ggBRT::ggPerformance(agi_0m_60m_250m_Yspat_Ytag)
                     Model 1
Total.Deviance     1.3862829
Residual.Deviance  0.2996914
Correlation        0.9203897
AUC                0.9931000
Per.Expl          78.3816567
cvDeviance         0.4880000
cvCorrelation      0.8312630
cvAUC              0.9631300
cvPer.Expl        64.7979495
#relative influence of predictors
ggBRT::ggInfluence(agi_0m_60m_250m_Yspat_Ytag) 

              rel.inf
tag        38.8772398
dist_coast 19.6961943
lat         9.2401730
AGI_250m    7.9838041
bathy_mean  4.2759852
temp_mean   4.1810197
AGI_0m      3.2807915
AGI_60m     2.2096760
sal_mean    2.1899170
chl_mean    2.0352351
ssh_mean    1.2407309
pred_var    1.0135341
mld_mean    0.9384425
bathy_sd    0.6260702
vo_mean     0.5836817
uostr_mean  0.5717539
vostr_mean  0.5376097
uo_mean     0.5181415
#explore partial plots
gbm.plot(agi_0m_60m_250m_Yspat_Ytag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
agi_int <- gbm.interactions(agi_0m_60m_250m_Yspat_Ytag)
agi_int$rank.list
   var1.index var1.names var2.index var2.names int.size
1           2        lat          1        tag   554.02
2          17   AGI_250m          1        tag   272.42
3          12 bathy_mean          1        tag   268.95
4           5   sal_mean          1        tag   205.22
5           3   chl_mean          1        tag   175.43
6          16    AGI_60m          1        tag   134.83
7           4  temp_mean          1        tag   134.10
8          15 dist_coast          1        tag   126.25
9          14     AGI_0m          1        tag   118.51
10         18   pred_var          1        tag    91.28
11         13   bathy_sd          1        tag    73.70
12         10   ssh_mean          1        tag    62.40
13         11   mld_mean          1        tag    54.65
14          8    vo_mean          1        tag    53.91
15         17   AGI_250m          2        lat    44.17
16          9 vostr_mean          1        tag    40.01
#predictive performance using test dataset 
preds <- predict.gbm(agi_0m_60m_250m_Yspat_Ytag, dat_test_agi,
                     n.trees = agi_0m_60m_250m_Yspat_Ytag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_agi$PA, preds) #get % deviance
[1] 0.3450229
dat_pred_agi <- cbind(dat_test_agi$PA, preds)
pres_agi <- dat_pred_agi[dat_pred_agi[,1] == 1, 2]
abs_agi <- dat_pred_agi[dat_pred_agi[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_agi, a = abs_agi)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.7428373
max(e@TPR + e@TNR -1) #TSS
[1] 0.8795687
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(agi_0m_60m_250m_Yspat_Ytag)
[1] 78.38166
#eval 75/25
eval_7525_modified(agi_0m_60m_250m_Yspat_Ytag, 
                 testInput = dat_test_agi, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
6250 iterations were performed.
There were 18 predictors of which 18 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.2208312 0.9003157 0.9869252   1.00278         0.7511082 0.7838166

Summary table of results

output_sum <- read.csv(here("data/brt/mod_outputs/brt_crw_output_summary.csv"))
kableExtra::kable(output_sum)
model percent_explained calc_deviance TPR_mean TSS AUC RMSE Cor C.index PredRatio PseudoR2
base_0m_Nspat_Ntag 58.62 0.634 0.723 0.7550 0.948 0.3096 0.793 0.948 0.9923 0.586
base_0m_Nspat_Ytag 83.37 0.286 0.745 0.9199 0.992 0.1920 0.926 0.992 0.9940 0.834
base_0m_Yspat_Ytag 87.20 0.232 0.747 0.9360 0.995 0.1710 0.942 0.995 0.9960 0.872
do_0m_Nspat_Ytag 75.41 0.393 0.740 0.8480 0.981 0.2400 0.881 0.981 0.9980 0.754
do_0m_Yspat_Ytag 77.14 0.366 0.741 0.8580 0.984 0.2300 0.890 0.984 0.9970 0.771
do_0m_60m_Nspat_Ytag 76.37 0.382 0.749 0.8550 0.982 0.2360 0.885 0.982 0.9990 0.764
do_0m_250m_Nspat_Ytag 77.05 0.374 0.741 0.8570 0.983 0.2340 0.886 0.983 0.9980 0.770
do_0m_60m_250m_Nspat_Ytag 76.97 0.376 0.741 0.8530 0.882 0.2350 0.885 0.982 0.9990 0.770
do_0m_60m_250m_Yspat_Ytag 78.28 0.356 0.742 0.8650 0.985 0.2270 0.893 0.985 0.9980 0.783
agi_0m_Nspat_Ytag 76.24 0.377 0.741 0.8660 0.984 0.2310 0.890 0.984 1.0000 0.762
agi_0m_Yspat_Ytag 77.78 0.351 0.743 0.8780 0.986 0.2230 0.898 0.986 1.0020 0.778
agi_0m_60m_Nspat_Ytag 76.21 0.377 0.741 0.8620 0.984 0.2310 0.890 0.984 1.0000 0.762
agi_0m_250m_Nspat_Ytag 76.55 0.371 0.742 0.8650 0.984 0.2300 0.891 0.984 1.0000 0.765
agi_0m_60m_250m_Nspat_Ytag 76.96 0.366 0.742 0.8650 0.985 0.2280 0.893 0.985 1.0000 0.770
agi_0m_60m_250m_Yspat_Ytag 78.38 0.345 0.743 0.8790 0.987 0.2210 0.900 0.987 1.0000 0.784